Homework 7

Chosen dataset: Transcriptome Analysis Reveals Non-Foamy Rather Than Foamy Plaque Macrophages Are Proinflammatory in Atherosclerotic Murine Models

Reading the data

First of all, we need to create an AnnData object.

Filtering

Now we filter out cells with fewer than 200 detected genes and genes that are detected in fewer than 3 cells.
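
The filtering rule above can be illustrated with a minimal, standard-library sketch. It is not the notebook's own code: `filter_counts` and the toy matrix are hypothetical, and the thresholds are lowered so the effect is visible on four cells. Cells are filtered first and genes second, which is an assumption about the order used here.

```python
def filter_counts(counts, min_genes=200, min_cells=3):
    """Filter a cells x genes count matrix given as a list of rows.

    Cells with fewer than min_genes detected genes are dropped first,
    then genes detected in fewer than min_cells of the remaining cells.
    """
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for c in row if c > 0) >= min_genes
    ]
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


# Toy data (hypothetical): 4 cells x 4 genes, with small thresholds.
toy = [
    [5, 0, 1, 0],
    [2, 3, 0, 0],
    [0, 0, 0, 7],
    [1, 1, 4, 0],
]
filtered, kept_cells, kept_genes = filter_counts(toy, min_genes=2, min_cells=2)
print(kept_cells, kept_genes)
```

With the thresholds from the text, the same function would be called with its defaults (`min_genes=200`, `min_cells=3`).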

Next we calculate the percentage of counts in each cell that comes from mitochondrial genes. Extensive mitochondrial contamination is often present in dying or low-quality cells.
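
As a rough sketch of what this metric measures, the percentage can be computed per cell as mitochondrial counts divided by total counts. The function name and the toy values are hypothetical, and the `mt-` prefix is an assumption based on the usual naming of mouse mitochondrial genes.

```python
def pct_counts_mt(counts, gene_names, prefix="mt-"):
    """Return, for each cell, the percentage of counts from mitochondrial genes.

    counts is a cells x genes list of rows; genes whose name starts with
    `prefix` (case-insensitive) are treated as mitochondrial (assumption).
    """
    is_mt = [name.lower().startswith(prefix) for name in gene_names]
    result = []
    for row in counts:
        total = sum(row)
        mt = sum(c for c, flag in zip(row, is_mt) if flag)
        result.append(100.0 * mt / total if total > 0 else 0.0)
    return result


# Toy data (hypothetical): 2 cells x 4 genes.
genes = ["mt-Co1", "Actb", "mt-Nd1", "Cd68"]
cells = [[10, 80, 10, 100], [1, 50, 0, 49]]
pct = pct_counts_mt(cells, genes)
print(pct)  # [10.0, 1.0]
```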

As the plots show, we can filter out cells with n_genes_by_counts below 1000 or above 5000, as well as cells with pct_counts_mt above 5%.
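
The thresholds above amount to a simple per-cell predicate. The sketch below uses hypothetical cell records and treats the boundary values (exactly 1000, 5000 or 5%) as passing, which is an assumption.

```python
def passes_qc(cell, min_genes=1000, max_genes=5000, max_pct_mt=5.0):
    """Return True if a cell lies within the QC thresholds described above."""
    return (
        min_genes <= cell["n_genes_by_counts"] <= max_genes
        and cell["pct_counts_mt"] <= max_pct_mt
    )


# Hypothetical per-cell QC metrics.
cells = [
    {"id": "a", "n_genes_by_counts": 800, "pct_counts_mt": 2.0},   # too few genes
    {"id": "b", "n_genes_by_counts": 2500, "pct_counts_mt": 1.5},  # passes
    {"id": "c", "n_genes_by_counts": 3000, "pct_counts_mt": 7.2},  # too much mt
    {"id": "d", "n_genes_by_counts": 6100, "pct_counts_mt": 0.9},  # too many genes
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)  # ['b']
```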

Now the plots look good.

Normalizing and scaling the data

Now we will normalize the data, so that counts become comparable among cells.

Logarithmize the data:
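
The two steps above (depth normalization followed by a log transform) can be sketched as follows. The target of 10,000 counts per cell is an assumption; the text does not state the value used. `normalize_and_log` is a hypothetical helper, not a library function.

```python
import math


def normalize_and_log(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(c * scale) for c in row])
    return out


# Two hypothetical cells with the same composition but 10x different depth.
cells = [[1, 3], [10, 30]]
normalized = normalize_and_log(cells)
print(normalized[0] == normalized[1])  # True
```

After normalization the two cells have identical values, which is what "comparable among cells" means in practice: differences in sequencing depth no longer dominate.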

Identification of highly variable genes.

Calculating and focusing on a subset of features that exhibit high cell-to-cell variation in the dataset can help to highlight biological signal.
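
To make the idea of "high cell-to-cell variation" concrete, the sketch below ranks genes by a simple dispersion score (variance divided by mean). This is a deliberate simplification: the selection method actually used in the analysis is not stated in the text and is likely more elaborate, and the gene names and values here are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, gene_names, n_top=2):
    """Rank genes by dispersion (variance / mean) and return the top n_top names."""
    scores = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in expr]
        m = mean(column)
        dispersion = pvariance(column) / m if m > 0 else 0.0
        scores.append((dispersion, name))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:n_top]]


# Hypothetical expression values: 4 cells x 3 genes.
names = ["gene_a", "gene_b", "gene_c"]
expr = [
    [1.0, 0.0, 2.0],
    [1.0, 4.0, 3.0],
    [1.0, 0.0, 2.0],
    [1.0, 4.0, 3.0],
]
selected = top_variable_genes(expr, names, n_top=2)
print(selected)  # ['gene_b', 'gene_c']
```

Here `gene_a` is constant across cells and carries no information for separating them, so it is the first to be dropped.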

Filtering highly expressed genes

Regressing out the effects of total counts per cell and the percentage of mitochondrial genes expressed.

Scaling each gene to unit variance.
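
Scaling to unit variance means subtracting the mean of each gene and dividing by its standard deviation. A minimal sketch for a single gene is shown below; `scale_gene` is a hypothetical helper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def scale_gene(values):
    """Center one gene's values at zero and scale them to unit variance."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption)
    if s == 0:
        # A constant gene has no variance to scale; return zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


scaled = scale_gene([1.0, 2.0, 3.0])
print(scaled)  # [-1.0, 0.0, 1.0]
```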

PCA

Calculating PCA:

Based on this elbow plot we can take about 30 PCs for further analysis.

Computing the neighborhood graph
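
The neighborhood graph connects each cell to its nearest neighbours in PC space. The brute-force sketch below illustrates the idea on hypothetical 2-D coordinates; real tools typically use approximate search and weighted edges, so this is only a conceptual example.

```python
import math


def knn_graph(points, k=2):
    """Map each point index to the indices of its k nearest neighbours (Euclidean)."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Hypothetical cells in a 2-D PC space: two well-separated groups.
pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(pcs, k=1)
print(graph)  # {0: [1], 1: [0], 2: [0], 3: [4], 4: [3]}
```

No edge crosses between the two groups, which is the structure that graph clustering later picks up.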

Embedding the neighborhood graph

We will use the UMAP method.

Clustering the neighborhood graph

We are going to use the Leiden graph-clustering method.

Let's look at the clusters in the related publication. They look broadly similar, but to establish a correspondence we need to check how the cells in each cluster express the marker genes. In addition, this analysis uses UMAP instead of t-SNE, which partially explains the difference.

[Figure: umap]

Comparison with the publication

Let's look at the expression of the principal hematopoietic markers in the 11 identified cell clusters.

As we can see, the expression of Fcgr1 (a macrophage marker gene) is very similar between the publication and this analysis. It is highly expressed in clusters 0-5, 7 and especially 8, which corresponds to the publication.

Another macrophage marker gene, Mafb, is also highly expressed in clusters 0-5, 7 and 8.

Lyve1 is highly expressed in clusters 0, 4 and 7.

Mrc1 is highly expressed in clusters 0, 4, 5 and 7, which suggests that these clusters may be resident adventitial macrophages.

Cluster 9 shows high expression of Cd3e, which encodes a component of the T-cell receptor complex.

Itgax is highly expressed in clusters 3 and 6, which corresponds to the publication.

Clusters 6 and 10 show high expression of the Flt3 gene, which is required for dendritic cell differentiation.

The expression of the Abcg1 gene corresponds to the publication.

Now let's look at the heatmap.

Conclusion: Comparison with the related publication shows that the results are very similar to the corresponding plots, except for the numbering of the clusters (e.g. clusters 3 and 4 appear to be swapped). This can be attributed to the different number of PCs used for detecting neighbours and to other parameters used in the analysis.